Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning
Slack, Dean L., Moubayed, Noura Al
Most defences target the pre-training stage, leaving memorisation during fine-tuning -- especially for domain adaptation and instruction tuning -- poorly understood. We fine-tune Pythia, Llama3, and Mistral models spanning 1.4B-70B parameters on common evaluation datasets and track verbatim memorisation throughout training. We find that memorisation increases dramatically in the first few epochs, often well before either validation perplexity or evaluation performance is optimised. We use a simple but effective n-gram memorisation score that reliably precedes verbatim memorisation; using it as an early-stopping criterion mitigates memorisation with minimal performance loss. Further, we introduce an n-gram-aware loss regulariser and show that it reduces memorisation across all model families tested by up to 40% while minimising evaluation performance trade-offs compared to an existing memorisation mitigation strategy. These results yield practical, scalable insights into memorisation dynamics during language model fine-tuning.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.68)
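The abstract's n-gram memorisation score can be illustrated with a minimal sketch: the fraction of a generation's n-grams that occur verbatim anywhere in the fine-tuning data. The function names, the toy data, and the choice of n are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_memorisation_score(generated, training_corpus, n=8):
    """Fraction of the generation's n-grams that occur verbatim anywhere
    in the training corpus. Values near 1.0 suggest heavy copying."""
    train_ngrams = set()
    for doc in training_corpus:
        train_ngrams.update(ngrams(doc, n))
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    return sum(g in train_ngrams for g in gen) / len(gen)

# Toy example (hypothetical data): the generation copies part of a training doc.
train = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]]
gen = ["the", "quick", "brown", "fox", "went", "home"]
score = ngram_memorisation_score(gen, train, n=3)  # 2 of 4 trigrams match
```

Used as an early-stopping criterion, one would halt fine-tuning once this score on held-out generations crosses a chosen threshold.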
Efficient Unlearning with Privacy Guarantees
Domingo-Ferrer, Josep, Jebreel, Najeeb, Sánchez, David
Privacy protection laws, such as the GDPR, grant individuals the right to request the forgetting of their personal data not only from databases but also from machine learning (ML) models trained on them. Machine unlearning has emerged as a practical means to facilitate model forgetting of data instances seen during training. Although some existing machine unlearning methods guarantee exact forgetting, they are typically costly in computational terms. On the other hand, more affordable methods do not offer forgetting guarantees and are applicable only to specific ML models. In this paper, we present \emph{efficient unlearning with privacy guarantees} (EUPG), a novel machine unlearning framework that offers formal privacy guarantees to individuals whose data are being unlearned. EUPG involves pre-training ML models on data protected using privacy models, and it enables {\em efficient unlearning with the privacy guarantees offered by the privacy models in use}. Through empirical evaluation on four heterogeneous data sets protected with $k$-anonymity and $\epsilon$-differential privacy as privacy models, our approach demonstrates utility and forgetting effectiveness comparable to those of exact unlearning methods, while significantly reducing computational and storage costs. Our code is available at https://github.com/najeebjebreel/EUPG.
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Catalonia > Tarragona Province > Tarragona (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (1.00)
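EUPG's own mechanics are not spelled out in the abstract, but the $k$-anonymity privacy model it pre-trains on is easy to illustrate: every combination of quasi-identifier values must be shared by at least $k$ records, typically achieved by generalising attributes. The helper names and the toy data below are illustrative assumptions, not the paper's code.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values is shared
    by at least k records in the data set."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k

def generalise_age(records, width=10):
    """Coarsen exact ages into width-year buckets, a simple generalisation."""
    return [{**r, "age": r["age"] // width * width} for r in records]

# Hypothetical records: exact ages are unique, so the raw data is not
# 2-anonymous; bucketing ages into decades makes it so.
people = [{"age": a} for a in (23, 27, 25, 34, 31, 38)]
```

Pre-training on such protected data means the model never commits to any individual's exact values, which is what makes later unlearning cheap.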
An Approach Towards Learning K-means-friendly Deep Latent Representation
Clustering is a long-standing problem in data mining. Classical centroid-based approaches to clustering mainly face difficulty with high-dimensional inputs such as images. With the advent of deep neural networks, a common approach to this problem is to map the data to a latent space of comparatively lower dimension and then cluster in that space. The network architectures adopted for this are generally autoencoders (AEs) that reconstruct a given input at the output. To keep the input in a compact form, the encoder of an AE learns to extract useful features that are decoded at the reconstruction end. A well-known centroid-based clustering algorithm is K-means. In the context of deep feature learning, recent works have empirically shown the importance of learning the representations and the cluster centroids together. To this end, a continuous variant of K-means has recently been proposed, in which the softmax function is used in place of argmax so that the clustering and network parameters can be learned jointly using stochastic gradient descent (SGD). However, unlike classical K-means, where the input space stays constant, here the centroids are learned in parallel with the latent space for every batch of data. Such batch updates disagree with the concept of classical K-means, where the clustering space remains constant because it is the input space itself. To address this, we propose to alternately learn a clustering-friendly data representation and K-means-based cluster centers. Experiments on benchmark datasets show improvements of our approach over previous approaches.
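The softmax relaxation of K-means mentioned above can be sketched in a few lines of NumPy: assignments become a softmax over negative squared distances, which is differentiable and so trainable with SGD alongside the encoder. Function names and the temperature parameter tau are illustrative assumptions.

```python
import numpy as np

def soft_assign(z, centroids, tau=1.0):
    """Softmax over negative squared distances: a differentiable
    relaxation of K-means' hard argmin assignment."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def kmeans_loss(z, centroids, tau=1.0):
    """Expected squared distance under the soft assignment; as tau -> 0
    this approaches the classical K-means objective."""
    p = soft_assign(z, centroids, tau)
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return (p * d2).sum(axis=1).mean()

# Toy latent points sitting exactly on two centroids: at low temperature
# the soft assignment is nearly one-hot and the loss is near zero.
z = np.array([[0.0, 0.0], [10.0, 10.0]])
c = np.array([[0.0, 0.0], [10.0, 10.0]])
p = soft_assign(z, c, tau=0.1)
```

The paper's alternating scheme would freeze `c` while updating the encoder that produces `z`, then re-fit the centroids, rather than updating both on every batch.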
Magnificent Minified Models
Harang, Rich, Sanders, Hillary
There are many ways to make a deep neural network smaller. In this paper, we focus on three categories of model size reduction: pruning, quantization, and training smaller models from scratch. Quantization means converting model parameters to lower-precision formats, such as changing all 32-bit floating point parameters to 16-bit, which roughly halves the file size. Pruning deletes parameters or groups of parameters (such as entire neurons) from a trained model to make it smaller, often followed by a fine-tuning round of training, as done in our experiments. Parameter-level pruning (also called unstructured pruning) prunes individual parameters one at a time, whereas neuron-level pruning (also called structured pruning) prunes all parameters associated with a given neuron at once. To simplify terminology across multiple methods, we use the term 'damage' to refer broadly to the undesired impact on network performance of removing a node or zeroing a weight. Different compression methods either estimate damage directly, or rank neurons or weights in order of increasing assumed damage according to some other metric that does not directly evaluate the impact on loss or performance.
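The three reduction families described above can each be sketched in a few lines of NumPy on a single weight matrix. This is an illustrative baseline (magnitude-based ranking as the damage proxy), not the paper's code.

```python
import numpy as np

def quantise_fp16(w):
    """Quantization: cast float32 weights to float16 and back --
    half the stored bytes, at the cost of small rounding error."""
    return w.astype(np.float16).astype(np.float32)

def prune_unstructured(w, sparsity):
    """Parameter-level pruning: zero the smallest-magnitude fraction
    of individual weights (ties may zero slightly more)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

def prune_structured(w, n_neurons):
    """Neuron-level pruning: zero the output rows (neurons) with the
    smallest L2 norm, removing all their parameters at once."""
    norms = np.linalg.norm(w, axis=1)
    out = w.copy()
    out[np.argsort(norms)[:n_neurons], :] = 0.0
    return out

# Toy 3x3 weight matrix for the examples below.
w = np.arange(1.0, 10.0, dtype=np.float32).reshape(3, 3)
```

Magnitude is the classic proxy for "damage": small weights or low-norm neurons are assumed cheapest to remove, without directly evaluating the loss.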
L2PF -- Learning to Prune Faster
Vemparala, Manoj-Rohit, Fasfous, Nael, Frickenstein, Alexander, Moraly, Mhd Ali, Jamal, Aquib, Frickenstein, Lukas, Unger, Christian, Nagaraja, Naveen-Shankar, Stechele, Walter
Various applications in the field of autonomous driving are based on convolutional neural networks (CNNs), especially for processing camera data. The optimization of such CNNs is a major challenge in continuous development. Newly learned features must be brought into vehicles as quickly as possible, so it is not feasible to spend redundant GPU hours during compression. In this context, we present Learning to Prune Faster, a multi-task, try-and-learn method that discretely learns which filters of the CNN are redundant and a continuous action for how long each layer has to be fine-tuned. This allows us to significantly speed up the convergence of learning how to find an embedded-friendly, filter-wise pruned CNN. For ResNet20, we achieved a compression ratio of 3.84x with minimal accuracy degradation. Compared to the state-of-the-art pruning method, we reduced the GPU hours by 1.71x.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)